An Extensive Empirical Study of Collocation Extraction Methods

نویسنده

  • Pavel Pecina
چکیده

This paper presents a status quo of an ongoing research study of collocations – an essential linguistic phenomenon having a wide spectrum of applications in the field of natural language processing. The core of the work is an empirical evaluation of a comprehensive list of automatic collocation extraction methods using precision-recall measures and a proposal of a new approach integrating multiple basic methods and statistical classification. We demonstrate that combining multiple independent techniques leads to a significant performance improvement in comparisonwith individualbasic methods.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Significance tests for the evaluation of ranking methods

This paper presents a statistical model that interprets the evaluation of ranking methods as a random experiment. This model predicts the variability of evaluation results, so that appropriate significance tests for the results can be derived. The paper concludes with an empirical validation of the model on a collocation extraction task.

متن کامل

Normalized (Pointwise) Mutual Information in Collocation Extraction

In this paper, we discuss the related information theoretical association measures of mutual information and pointwise mutual information, in the context of collocation extraction. We introduce normalized variants of these measures in order to make them more easily interpretable and at the same time less sensitive to occurrence frequency. We also provide a small empirical study to give more ins...

متن کامل

Combining Association Measures for Collocation Extraction

We introduce the possibility of combining lexical association measures and present empirical results of several methods employed in automatic collocation extraction. First, we present a comprehensive summary overview of association measures and their performance on manually annotated data evaluated by precision-recall graphs and mean average precision. Second, we describe several classification...

متن کامل

Multi-label Classification of Semantic Relations in German Nominal Compounds using SVMs

The current study compares lexical association measures for automatic extraction of Estonian particle verbs from the text corpus. The central focus lies on the impact of the corpus size on the performance of the compared symmetrical association measures. Additionally a piece of empirical evidence of the advantage of asymmetric association measure ΔP for the task of collocation extra...

متن کامل

A Mobile Touchable Application for Online Topic Graph Extraction and Exploration of Web Content

We present a mobile touchable application for online topic graph extraction and exploration of web content. The system has been implemented for operation on an iPad. The topic graph is constructed from N web snippets which are determined by a standard search engine. We consider the extraction of a topic graph as a specific empirical collocation extraction task where collocations are extracted b...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2005